
    RNA Accessibility in cubic time

    Background: The accessibility of RNA binding motifs controls the efficacy of many biological processes. Examples are the binding of miRNA, siRNA or bacterial sRNA to their respective targets. Similarly, the accessibility of the Shine-Dalgarno sequence is essential for translation to start in prokaryotes. Furthermore, many classes of RNA binding proteins require the binding site to be single-stranded. Results: We introduce a way to compute the accessibility of all intervals within an RNA sequence in O(n^3) time. This improves on previous implementations where only intervals of one defined length were computed in the same time. While the algorithm is in the same efficiency class as sampling approaches, the results, especially if the probabilities get small, are much more exact. Conclusions: Our algorithm significantly speeds up methods for the prediction of RNA-RNA interactions and other applications that require the accessibility of RNA molecules. The algorithm is already available in the program RNAplfold of the ViennaRNA package.

    Polynomial algorithms for the Maximal Pairing Problem: efficient phylogenetic targeting on arbitrary trees

    Background: The Maximal Pairing Problem (MPP) is the prototype of a class of combinatorial optimization problems that are of considerable interest in bioinformatics: Given an arbitrary phylogenetic tree T and weights ωxy for the paths between any two pairs of leaves (x, y), what is the collection of edge-disjoint paths between pairs of leaves that maximizes the total weight? Special cases of the MPP for binary trees and equal weights have been described previously; algorithms to solve the general MPP are still missing, however. Results: We describe a relatively simple dynamic programming algorithm for the special case of binary trees. We then show that the general case of multifurcating trees can be treated by interleaving solutions to certain auxiliary Maximum Weighted Matching problems with an extension of this dynamic programming approach, resulting in an overall polynomial-time solution of complexity O(n^4 log n) w.r.t. the number n of leaves. The source code of a C implementation can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/Targeting. For binary trees, we furthermore discuss several constrained variants of the MPP as well as a partition function approach to the probabilistic version of the MPP. Conclusions: The algorithms introduced here make it possible to solve the MPP also for large trees with high-degree vertices. This has practical relevance in the field of comparative phylogenetics and, for example, in the context of phylogenetic targeting, i.e., data collection with resource limitations.

    RNA secondary structure prediction from multi-aligned sequences

    It has been well accepted that the RNA secondary structures of most functional non-coding RNAs (ncRNAs) are closely related to their functions and are conserved during evolution. Hence, prediction of conserved secondary structures from evolutionarily related sequences is one important task in RNA bioinformatics; the methods are useful not only to further functional analyses of ncRNAs but also to improve the accuracy of secondary structure predictions and to find novel functional RNAs from the genome. In this review, I focus on common secondary structure prediction from a given aligned RNA sequence, in which one secondary structure whose length is equal to that of the input alignment is predicted. I systematically review and classify existing tools and algorithms for the problem, by utilizing the information employed in the tools and by adopting a unified viewpoint based on maximum expected gain (MEG) estimators. I believe that this classification will allow a deeper understanding of each tool and provide users with useful information for selecting tools for common secondary structure predictions. Comment: A preprint of an invited review manuscript that will be published in a chapter of the book 'Methods in Molecular Biology'. Note that this version of the manuscript may differ from the published version.

    An analysis of simple computational strategies to facilitate the design of functional molecular information processors

    BACKGROUND: Biological macromolecules (DNA, RNA and proteins) are capable of processing physical or chemical inputs to generate outputs that parallel conventional Boolean logical operators. However, the design of functional modules that will enable these macromolecules to operate as synthetic molecular computing devices is challenging. RESULTS: Using three simple heuristics, we designed RNA sensors that can mimic the function of a seven-segment display (SSD). Ten independent and orthogonal sensors representing the numerals 0 to 9 are designed and constructed. Each sensor has its own unique oligonucleotide binding site region that is activated uniquely by a specific input. Each operator was subjected to a stringent in silico filtering. Random sensors were selected and functionally validated via ribozyme self-cleavage assays that were visualized via electrophoresis. CONCLUSIONS: By utilising simple permutation and randomisation in the sequence design phase, we have developed functional RNA sensors, thus demonstrating that even the simplest of computational methods can greatly aid the design phase for constructing functional molecular devices. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1297-x) contains supplementary material, which is available to authorized users.

    Multilevel Selection in Models of Prebiotic Evolution II: A Direct Comparison of Compartmentalization and Spatial Self-Organization

    Multilevel selection has been indicated as an essential factor for the evolution of complexity in interacting RNA-like replicator systems. There are two types of multilevel selection mechanisms: implicit and explicit. For implicit multilevel selection, spatial self-organization of replicator populations has been suggested, which leads to higher level selection among emergent mesoscopic spatial patterns (traveling waves). For explicit multilevel selection, compartmentalization of replicators by vesicles has been suggested, which leads to higher level evolutionary dynamics among explicitly imposed mesoscopic entities (protocells). Historically, these mechanisms have been considered separately, each for its own interest. Here, we make a direct comparison between spatial self-organization and compartmentalization in simulated RNA-like replicator systems. Firstly, we show that both mechanisms achieve the macroscopic stability of a replicator system through the evolutionary dynamics on mesoscopic entities that counteract that of microscopic entities. Secondly, we show that a striking difference exists between the two mechanisms regarding their possible influence on the long-term evolutionary dynamics, which happens under an emergent trade-off situation arising from the multilevel selection. The difference is explained in terms of the difference in the stability between self-organized mesoscopic entities and externally imposed mesoscopic entities. Thirdly, we show that a sharp transition happens in the long-term evolutionary dynamics of the compartmentalized system as a function of replicator mutation rate. Fourthly, the results imply that spatial self-organization can allow the evolution of stable folding in parasitic replicators without any specific functionality in the folding itself.
Finally, the results are discussed in relation to the experimental synthesis of chemical Darwinian systems and to the multilevel selection theory of evolutionary biology in general. To conclude, novel evolutionary directions can emerge through interactions between the evolutionary dynamics on multiple levels of organization. Different multilevel selection mechanisms can produce a difference in the long-term evolutionary trend of identical microscopic entities.

    Directed acyclic graph kernels for structural RNA analysis

    Background: Recent discoveries of a large variety of important roles for non-coding RNAs (ncRNAs) have been reported by numerous researchers. In order to analyze ncRNAs by kernel methods including support vector machines, we propose stem kernels as an extension of string kernels for measuring the similarities between two RNA sequences from the viewpoint of secondary structures. However, applying stem kernels directly to large data sets of ncRNAs is impractical due to their computational complexity. Results: We have developed a new technique based on directed acyclic graphs (DAGs) derived from base-pairing probability matrices of RNA sequences that significantly increases the computation speed of stem kernels. Furthermore, we propose profile-profile stem kernels for multiple alignments of RNA sequences which utilize base-pairing probability matrices for multiple alignments instead of those for individual sequences. Our kernels outperformed the existing methods with respect to the detection of known ncRNAs and kernel hierarchical clustering. Conclusion: Stem kernels can be utilized as a reliable similarity measure of structural RNAs, and can be used in various kernel-based applications.

    Efficient Algorithms for Probing the RNA Mutation Landscape

    The diversity and importance of the role played by RNAs in the regulation and development of the cell are now well-known and well-documented. This broad range of functions is achieved through specific structures that have been (presumably) optimized through evolution. State-of-the-art methods, such as McCaskill's algorithm, use a statistical mechanics framework based on the computation of the partition function over the canonical ensemble of all possible secondary structures on a given sequence. Although secondary structure predictions from thermodynamics-based algorithms are not as accurate as methods employing comparative genomics, the former methods are the only available tools to investigate novel RNAs, such as the many RNAs of unknown function recently reported by the ENCODE consortium. In this paper, we generalize the McCaskill partition function algorithm to sum over the grand canonical ensemble of all secondary structures of all mutants of the given sequence. Specifically, our new program, RNAmutants, simultaneously computes for each integer k the minimum free energy structure MFE(k) and the partition function Z(k) over all secondary structures of all k-point mutants, even allowing the user to specify certain positions required not to mutate and certain positions required to base-pair or remain unpaired. This technically important extension allows us to study the resilience of an RNA molecule to pointwise mutations. By computing the mutation profile of a sequence, a novel graphical representation of the mutational tendency of nucleotide positions, we analyze the deleterious nature of mutating specific nucleotide positions or groups of positions. We have successfully applied RNAmutants to investigate deleterious mutations (mutations that radically modify the secondary structure) in the Hepatitis C virus cis-acting replication element and to evaluate the evolutionary pressure applied on different regions of the HIV trans-activation response element. 
In particular, we show qualitative agreement between published Hepatitis C and HIV experimental mutagenesis studies and our analysis of deleterious mutations using RNAmutants. Our work also predicts other deleterious mutations, which could be verified experimentally. Finally, we provide evidence that the 3′ UTR of the GB RNA virus C has been optimized to preserve evolutionarily conserved stem regions from a deleterious effect of pointwise mutations. We hope that there will be long-term potential applications of RNAmutants in de novo RNA design and drug design against RNA viruses. This work also suggests potential applications for large-scale exploration of the RNA sequence-structure network. Binary distributions are available at http://RNAmutants.csail.mit.edu/

    AUG_hairpin: prediction of a downstream secondary structure influencing the recognition of a translation start site

    Background: The translation start site plays an important role in the control of translation efficiency of eukaryotic mRNAs. The recognition of the start AUG codon by eukaryotic ribosomes is considered to depend on its nucleotide context. However, the fraction of eukaryotic mRNAs with the start codon in a suboptimal context is relatively large. It may be expected that mRNA should possess some features providing efficient translation, including the proper recognition of a translation start site. It has been experimentally shown that a downstream hairpin located in certain positions with respect to start codon can compensate in part for the suboptimal AUG context and also increases translation from non-AUG initiation codons. Prediction of such a compensatory hairpin may be useful in the evaluation of eukaryotic mRNA translation properties. Results: We evaluated interdependency between the start codon context and mRNA secondary structure at the CDS beginning: it was found that a suboptimal start codon context significantly correlated with higher base pairing probabilities at positions 13–17 of CDS of human and mouse mRNAs. It is likely that the downstream hairpins are used to enhance translation of some mammalian mRNAs in vivo. Thus, we have developed a tool, AUG_hairpin, to predict local stem-loop structures located within the defined region at the beginning of mRNA coding part. The implemented algorithm is based on the available published experimental data on the CDS-located stem-loop structures influencing the recognition of upstream start codons. Conclusion: An occurrence of a potential secondary structure downstream of start AUG codon in a suboptimal context (or downstream of a potential non-AUG start codon) may provide researchers with a testable assumption on the presence of additional regulatory signal influencing mRNA translation initiation rate and the start codon choice.
AUG_hairpin, which has a convenient Web-interface with adjustable parameters, will make such an evaluation easy and efficient.

    Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework

    Background: Structural alignment of RNAs is becoming important, since the discovery of functional non-coding RNAs (ncRNAs). Recent studies, mainly based on various approximations of the Sankoff algorithm, have resulted in considerable improvement in the accuracy of pairwise structural alignment. In contrast, for the cases with more than two sequences, the practical merit of structural alignment remains unclear as compared to traditional sequence-based methods, although the importance of multiple structural alignment is widely recognized. Results: We took a different approach from a straightforward extension of the Sankoff algorithm to the multiple alignments from the viewpoints of accuracy and time complexity. As a new option of the MAFFT alignment program, we developed a multiple RNA alignment framework, X-INS-i, which builds a multiple alignment with an iterative method incorporating structural information through two components: (1) pairwise structural alignments by an external pairwise alignment method such as SCARNA or LaRA and (2) a new objective function, Four-way Consistency, derived from the base-pairing probability of every sub-aligned group at every multiple alignment stage. Conclusion: The BRAliBASE benchmark showed that X-INS-i outperforms other methods currently available in the sum-of-pairs score (SPS) criterion. As a basis for predicting common secondary structure, the accuracy of the present method is comparable to or rather higher than those of the current leading methods such as RNA Sampler. The X-INS-i framework can be used for building a multiple RNA alignment from any combination of algorithms for pairwise RNA alignment and base-pairing probability. The source code is available at the webpage found in the Availability and requirements section.

    Prediction of RNA secondary structure by maximizing pseudo-expected accuracy

    Background: Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure predictions; therefore, a new type of estimator is proposed including the maximum expected accuracy (MEA) estimator. The MEA-based estimators have been designed to maximize the expected accuracy of the base-pairs and have achieved the highest level of accuracy. Those methods, however, do not give the single best prediction of the structure, but employ parameters to control the trade-off between the sensitivity and the positive predictive value (PPV). It is unclear what parameter value we should use, and even the well-trained default parameter value does not, in general, give the best result in popular accuracy measures for each RNA sequence. Results: Instead of using the expected values of the popular accuracy measures for RNA secondary structure prediction, which are difficult to calculate, the pseudo-expected accuracy, which can easily be computed from base-pairing probabilities, is introduced. It is shown that the pseudo-expected accuracy is a good approximation in terms of sensitivity, PPV, MCC, or F-score. The pseudo-expected accuracy can be approximately maximized for each RNA sequence by stochastic sampling. It is also shown that well-balanced secondary structures between sensitivity and PPV can be predicted with a small computational overhead by combining the pseudo-expected accuracy of MCC or F-score with the γ-centroid estimator. Conclusions: This study gives not only a method for predicting the secondary structure that balances between sensitivity and PPV, but also a general method for approximately maximizing the (pseudo-)expected accuracy with respect to various evaluation measures including MCC and F-score.